1 Introduction


Clustering is an important technique for discovering the inherent
structure present in a set of objects. Clustering algorithms attempt to
organize unlabelled pattern vectors into clusters, or natural groups, such
that the points within a cluster are more similar to each other than to
points belonging to different clusters. Vector Quantization refers to
representing a set of vectors by a small number of vectors. Thus the problem
of clustering may, in some cases, be taken to be the problem of finding a
vector quantization of the data set [3]. This is stated in detail below.


Let the set of patterns be $X = \{x_1, x_2, \ldots, x_n\}$, where $x_i$ is the
$i$th pattern vector, $X \subset \mathbb{R}^p$, and $X$ is finite. Let the number
of clusters be $k$. If the clusters are represented by $C_1, C_2, \ldots, C_k$, then

i) $C_i \neq \emptyset$ for $i = 1, \ldots, k$

ii) $C_i \cap C_j = \emptyset$ for $i \neq j$

iii) $\bigcup_{i=1}^{k} C_i = X$

where $\emptyset$ represents the null set and $k \geq 2$ [2].

An optimization criterion for clustering is the minimization of the sum of
squares of within-cluster distances, i.e.:

$$\sum_{i=1}^{k} \sum_{x \in C_i} \|x - v_i\|^2$$

where $x$ is the input vector. The set of vectors $V = (v_1, v_2, \ldots, v_k)$ is
called the codebook, and $v_i$ is the representative vector for class $C_i$. Thus
the cluster $C_i$ is quantized by the vector $v_i$. The process of designing the
codebook is called Vector Quantization. Many vector quantization techniques
use a clustering approach [9].
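The criterion above can be evaluated directly for a given partition. The following is a minimal sketch in plain Python; the helper names (`sq_dist`, `centroid`, `within_cluster_sse`) and the toy data are illustrative assumptions, and the cluster mean is used as the representative vector only because it is the standard minimizer of the sum of squares for a fixed cluster.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def centroid(cluster):
    """Component-wise mean of a non-empty cluster (assumed representative)."""
    n = len(cluster)
    return [sum(col) / n for col in zip(*cluster)]

def within_cluster_sse(clusters, codebook):
    """Sum over clusters C_i of ||x - v_i||^2 for every x in C_i."""
    return sum(sq_dist(x, v)
               for cluster, v in zip(clusters, codebook)
               for x in cluster)

# Toy partition of four points in the plane into k = 2 clusters.
clusters = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 0.0), (10.0, 4.0)]]
codebook = [centroid(c) for c in clusters]
sse = within_cluster_sse(clusters, codebook)
```

Any other choice of representative vectors for the same partition would give a value of `sse` at least as large.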

Neural networks have been employed in many clustering problems.
Among the existing models, the Self Organizing Feature Map of Kohonen
[4,11] finds the topological structure hidden in the input data. Kohonen's
Learning Vector Quantizer (LVQ) [10] network can perform clustering
when the number of clusters present in the data set is known a priori.

In the following sections, we deal with unsupervised pattern classification
using a neural network approach. In unsupervised algorithms, no
information concerning the correct class is provided to the nets. Each new
pattern is presented only once and the weights are modified after each
presentation. We assume throughout that the number of clusters, $k$, is known
a priori. The learning process used is known as the Competitive Learning
Scheme. The basic idea underlying competitive learning is roughly as follows:

Initially, taking into consideration the set $X$, consider a sequence of
vectors $x_t$, where $x_{n+1} = x_1, \ldots, x_{2n} = x_n$, $x_{2n+1} = x_1$, and so on.
Here $t$ is the time coordinate. A set of variable reference vectors
$\{v_{i,t} : v_{i,t} \in \mathbb{R}^p, \; i = 1, 2, \ldots, k\}$ is also taken, where the
$v_{i,0}$ have been initialized in some proper way (random selection will
suffice). If $x_t$ is compared with each $v_{i,t-1}$ at each successive instant $t$
(taken here to be $t = 1, 2, \ldots$), then the best-matching reference vector is
obtained by some distance measure $d(x_t, v_{i,t-1})$. If $i = c$ is the best-matching
reference vector, then

$$d(x_t, v_{c,t-1}) = \min_i d(x_t, v_{i,t-1})$$

Then $v_{c,t-1}$ is to be updated so that it moves closer to $x_t$. In the neural
network model the neighbouring cells in the output layer compete in their
activities such that, in the process, vectors tend to become specifically "tuned"
to different domains of the input variable $x$ [5].

2  Existing  Algorithms 

2.1  Learning  Vector  Quantization 

In VQ the objective is to find vectors $v_1, v_2, \ldots, v_k$ ($k \geq 2$)
such that

$$\sum_{x \in X} f(x, v_i)$$

is minimized. Here $v_i$ is the prototype closest to $x$ and $f(x, v_i)$ is a function
of the distance between $x$ and $v_i$. In the neural network based LVQ models,
where at any single instant only one input vector $x$ from $X$ is under
consideration, researchers have tried to achieve this aim by modifying the
vectors $v_1, v_2, \ldots, v_k$ at each instant with the help of $f(x, v_i)$ [where $v_i$ is
the winning prototype for $x$] and $f(x, v_r)$ [where $v_r$ is a non-winning
prototype for $x$].

LVQ has been associated in the literature with the neural network architecture
shown in Fig. 1. If $X = \{x_1, x_2, \ldots, x_n\} \subset \mathbb{R}^p$ denotes the
unlabelled data and $k$ denotes the number of clusters, then the input layer of
the network contains $p$ nodes and the output layer contains $k$ nodes. The
input layer is connected directly to the competition layer, i.e. the output layer.
The $i$th node in the output layer is associated with a weight vector $v_i$. The
$p$ components $\{v_{ji}\}$, $j = 1, \ldots, p$, of $v_i$ are often regarded as weights or
connection strengths of the edges that connect the $p$ inputs to node $i$.

The prototypes $V = (v_1, v_2, \ldots, v_k)$, $v_i \in \mathbb{R}^p$ for $1 \leq i \leq k$, are the
unknown vector quantizers we seek. In this context learning refers to finding
values for the $\{v_{ji}\}$. When an input vector $x$ is submitted to this network,
distances are computed between $x$ and each $v_i$. The output of node $i$ in the
output layer is the distance between $x$ and $v_i$. The output nodes compete,
a winner node (minimum distance), say $c$, is found, and the corresponding $v_c$
is then updated using an update rule.


The LVQ algorithm is given below:

step 1. Given the unlabelled data set $X = \{x_1, x_2, \ldots, x_n\} \subset \mathbb{R}^p$ and the
number of clusters $k$, fix $N$ = maximum number of updating steps, and
$\epsilon > 0$, where $\epsilon$ is the termination threshold.

step 2. Initialize $V_0 = (v_{1,0}, \ldots, v_{k,0})$, where each $v_{i,0} \in \mathbb{R}^p$, and the
learning rate $\alpha_0 \in (0, 1)$.

For $t = 1, 2, \ldots, N$:

For $j = 1, 2, \ldots, n$:

a. Find

$$\min_i \|x_j - v_{i,t-1}\| \tag{1}$$

Let $v_{c,t-1}$ be such that $\|x_j - v_{c,t-1}\| = \min_i \|x_j - v_{i,t-1}\|$.

b. Update the winner $v_{c,t-1}$:

$$v_{c,t} = v_{c,t-1} + \alpha_{t-1}(x_j - v_{c,t-1}) \tag{2}$$

Next $j$

step 3. Compute

$$E_t = \sum_{j=1}^{p} \sum_{i=1}^{k} |v_{ji,t} - v_{ji,t-1}|$$

step 4. If $E_t \leq \epsilon$, stop; else adjust the learning rate $\alpha_t \leftarrow \alpha_0(1 - t/N)$.
Next $t$.

step 5. For each $x$ in $X$, find $c$ such that

$$\|x - v_c\| = \min_j \|x - v_j\|$$

and mark $x$ with label $c$.


The update scheme used for modifying the winner has a simple geometric
interpretation, which is shown in Fig. 2.

The winning prototype $v_{c,t-1}$ is moved along the vector $(x_j - v_{c,t-1})$
towards $x_j$. The amount by which $v_{c,t-1}$ is shifted to arrive at $v_{c,t}$ depends
on the value of the learning rate parameter $\alpha_{t-1}$, where $\alpha_t \in [0, 1)$.
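The steps of the LVQ algorithm can be sketched as follows. This is a minimal illustration in plain Python, not a reference implementation: the function names, the default parameter values, the use of the L1 change of the codebook as the stopping quantity, and the synthetic two-cluster data are all assumptions made for the example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def winner(x, protos):
    """Index c of the prototype closest to x (eqn. 1)."""
    return min(range(len(protos)), key=lambda i: sq_dist(x, protos[i]))

def lvq(data, protos, alpha0=0.5, n_epochs=100, eps=1e-6):
    """Unsupervised LVQ: only the winner moves towards each input (eqn. 2)."""
    protos = [list(v) for v in protos]
    alpha = alpha0
    for t in range(1, n_epochs + 1):
        previous = [v[:] for v in protos]
        for x in data:
            c = winner(x, protos)
            protos[c] = [vc + alpha * (xc - vc) for vc, xc in zip(protos[c], x)]
        # Step 3: total absolute change of the codebook over this pass.
        change = sum(abs(a - b)
                     for v, u in zip(protos, previous)
                     for a, b in zip(v, u))
        if change <= eps:
            break
        alpha = alpha0 * (1.0 - t / n_epochs)      # step 4
    labels = [winner(x, protos) for x in data]     # step 5
    return protos, labels

# Hypothetical data: two well separated Gaussian blobs of 30 points each.
random.seed(0)
data = ([(random.gauss(0, 0.1), random.gauss(0, 0.1)) for _ in range(30)] +
        [(random.gauss(5, 0.1), random.gauss(5, 0.1)) for _ in range(30)])
protos, labels = lvq(data, protos=[(1.0, 1.0), (4.0, 4.0)])
```

Because only the winner is moved, each prototype here is influenced solely by the points for which it is the closest prototype.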


2.1.1 Initialization

The prototypes $V_0 = (v_{1,0}, v_{2,0}, \ldots, v_{k,0})$, with each $v_{i,0} \in \mathbb{R}^p$, have to
be initialized. There are several initialization schemes. An initialization
scheme used in the existing algorithms, and also in the proposed method, is
described below.

For the data set $X = \{x_1, x_2, \ldots, x_n\} \subset \mathbb{R}^p$, let data point $q$ and the initial
prototype $i$ be $x_q = (x_{1q}, x_{2q}, \ldots, x_{pq})^T$ and $v_i = (v_{1i}, v_{2i}, \ldots, v_{pi})^T$
respectively. Compute the feature ranges:

Minimum of feature $j$:

$$m_j = \min_q \{x_{jq}\}, \quad j = 1, 2, \ldots, p \tag{3}$$

Maximum of feature $j$:

$$M_j = \max_q \{x_{jq}\}, \quad j = 1, 2, \ldots, p \tag{4}$$

With these, compute the $j$th component of the $i$th initial prototype $v_{ji}$ as:

$$v_{ji} = m_j + (i - 1)\frac{M_j - m_j}{k - 1}, \quad i = 1, 2, \ldots, k; \; j = 1, 2, \ldots, p \tag{5}$$

Formula (5) disperses the initial prototype values uniformly along each feature
range.
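The initialization scheme can be written compactly. The sketch below assumes the spacing $(M_j - m_j)/(k - 1)$, so that the first prototype sits at the feature minima and the last at the feature maxima; the function name `init_prototypes` and the toy data are illustrative.

```python
def init_prototypes(data, k):
    """Spread k initial prototypes uniformly along each feature range.

    Prototype i (0-based here) gets component m_j + i * (M_j - m_j) / (k - 1),
    which matches the 1-based formula with (i - 1). Requires k >= 2.
    """
    p = len(data[0])
    lo = [min(x[j] for x in data) for j in range(p)]   # m_j
    hi = [max(x[j] for x in data) for j in range(p)]   # M_j
    return [[lo[j] + i * (hi[j] - lo[j]) / (k - 1) for j in range(p)]
            for i in range(k)]

data = [(0.0, 10.0), (4.0, 30.0), (2.0, 20.0)]
protos = init_prototypes(data, k=3)
```

The prototypes end up on the diagonal of the bounding box of the data, which is independent of the order in which the data are listed.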


2.2  GLVQ 

LVQ attempts to minimize an objective function that places all its emphasis
on the winning prototype for each data point. This is reflected in
eqn.(2), which alters only the winner. This, however, ignores global
information about the geometric structure of the data that is represented in
the remaining $(k - 1)$ distances from $x$ to the non-winner prototypes. In
this section the Generalized Learning Vector Quantization (GLVQ) algorithm is
discussed. The algorithm is associated with the same neural network
architecture, where the feature vectors $x$ provide the inputs to the map and the
weight vectors play the role of the prototypes $v_i$. The learning rule
associated with GLVQ is obtained by minimizing a cost function which measures
a locally weighted error of the input with respect to the winning prototype.
Mathematically this is explained below.

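Let $x \in \mathbb{R}^p$ be an input vector. Let $J$ be the cost function which
measures the locally weighted mismatch of $x$ with respect to the winner:

$$J(x; v_1, \ldots, v_k) = \sum_{r=1}^{k} g_r \|x - v_r\|^2 \tag{6}$$

$$g_r = \begin{cases} 1 & \text{if } r = \arg\min_i \|x - v_i\| \\ \dfrac{1}{\sum_{j=1}^{k} \|x - v_j\|^2} & \text{otherwise} \end{cases} \qquad 1 \leq r \leq k \tag{7}$$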

Here $X = \{x_1, \ldots, x_n\}$ is the set of inputs. The objective of GLVQ
is to find a set of $k$ prototypes $v_r$, say $V = \{v_r\}$, such that the locally
weighted error functional $J$ is minimized.

The update rules based on minimization of $J$ in eqn.(6) [6] are,
for the winner prototype $i$,

$$v_{i,t} = v_{i,t-1} + \alpha_{t-1}\left(\frac{D^2 - D + \|x - v_{i,t-1}\|^2}{D^2}\right)(x - v_{i,t-1}) \tag{8a}$$

and for a non-winner node $j$,

$$v_{j,t} = v_{j,t-1} + \alpha_{t-1}\left(\frac{\|x - v_{j,t-1}\|^2}{D^2}\right)(x - v_{j,t-1}) \quad \text{for } j = 1, 2, \ldots, k, \; j \neq i \tag{8b}$$

where $x$ is the current input vector and

$$D = \sum_{r=1}^{k} \|x - v_{r,t-1}\|^2$$

Tables 1 and 2 show the result of $T = 500$ iterations of GLVQ with the
initial learning rate $\alpha_0 = 0.6$ on IRIS and IRIS/10 respectively, where in
IRIS/10 the feature vectors of IRIS are scaled by a factor of 10.

The results show that the GLVQ algorithm does not work properly for the
scaled data. For some scalings of the data, the change in the winner prototype
may be smaller than the changes made to the other $(k - 1)$ prototypes, so the
non-winner prototypes are pulled towards the data more strongly than the
winner prototype. This results in all prototypes migrating to the same point
in $\mathbb{R}^p$, as they did for IRIS/10 [6], [7].
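The sensitivity to scaling can be seen by evaluating the multipliers of $(x - v)$ in eqns.(8a) and (8b) for one input before and after rescaling. The sketch below is illustrative only: the function name `glvq_factors`, the toy prototypes, and the choice of dividing all coordinates by 10 are assumptions made for the example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def glvq_factors(x, protos):
    """Return the winner index and the multiplier of (x - v) per prototype."""
    d = [sq_dist(x, v) for v in protos]
    i = min(range(len(d)), key=d.__getitem__)
    D = sum(d)
    factors = [dj / D ** 2 for dj in d]           # non-winners, eqn. (8b)
    factors[i] = (D ** 2 - D + d[i]) / D ** 2     # winner, eqn. (8a)
    return i, factors

x = (0.0, 0.0)
protos = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
win, raw = glvq_factors(x, protos)

# Same configuration with every coordinate divided by 10.
scaled_x = tuple(0.1 * c for c in x)
scaled_protos = [tuple(0.1 * c for c in v) for v in protos]
_, scaled = glvq_factors(scaled_x, scaled_protos)
```

For the raw configuration the winner has by far the largest multiplier. After rescaling, $D < 1$, the winner's multiplier becomes negative, and the non-winners receive the larger pull, which is the behaviour described above.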


2.3  Fuzzy  algorithms  for  learning  vector  quantization 

The algorithms are based on the minimization of a fuzzy objective function,
formed as the weighted sum of the squared Euclidean distances between
each input vector and the prototypes. Assuming that $x$ is the input vector,
$v_i$ is the winning prototype, and $k$ is the number of clusters, the update
equation for the prototypes can be derived by minimizing

$$J = \sum_{r=1}^{k} u_{ir} \|x - v_r\|^2 \tag{9}$$

where $u_{ir} = u_{ir}(x)$, $r = 1, 2, \ldots, k$, is a set of generalized membership
functions, which regulate the competition between the prototypes $v_r$,
$r = 1, 2, \ldots, k$, for the input $x$. The term generalized membership functions is
used to indicate that their form can be selected a priori according to some
intuitively reasonable criteria [7].

The development of a genuinely competitive learning vector quantization
algorithm requires the selection of the generalized membership functions
assigned to the prototypes. A fair competition among the prototypes is
guaranteed if the generalized membership function assigned to each
prototype:


•  is  invariant  to  the  magnitude  of  input  vectors. 

•  is  equal  to  unity  if  the  prototype  is  the  winner. 

• takes a value between 0 and 1 if the prototype is not a winner.

•  approaches  zero  if  the  prototype  is  not  a  winner  and  its  distance  from 
the  input  vector  approaches  infinity  [7]. 

Some  fuzzy  algorithms  for  LVQ  are  described  below. 


2.3.1  FALVQ 

Assuming $x$ is the input vector and $v_i$ is the winning prototype, i.e.,

$$\|x - v_i\|^2 < \|x - v_r\|^2 \quad \forall v_r \neq v_i$$

the above mentioned conditions are satisfied by the objective function
defined by eqn.(9) with

$$u_{ir} = \begin{cases} 1 & \text{if } r = i \\ \dfrac{1}{1 + \dfrac{\|x - v_r\|^2}{\|x - v_i\|^2}} & \text{if } r \neq i \end{cases} \tag{10}$$

According to this definition, $u_{ir}$ decreases from a value close to $\frac{1}{2}$ to 0 as
$\|x - v_r\|^2$ increases from a value slightly higher than $\|x - v_i\|^2$ to infinity.
The objective function $J$ is, therefore:

$$J = \|x - v_i\|^2 + \sum_{r \neq i} \left(\frac{1}{1 + \dfrac{\|x - v_r\|^2}{\|x - v_i\|^2}}\right)\|x - v_r\|^2 \tag{11}$$

The FALVQ update rules are derived by minimizing the above objective
function using the gradient descent method [7]. If $x$ is the input vector, the
winning prototype $v_i$ can be updated by:

$$\Delta v_i = \alpha (x - v_i)\left(1 + \sum_{r \neq i} (1 - u_{ir})^2\right) \tag{12a}$$

while the non-winning prototypes $v_j \neq v_i$ can be updated by:

$$\Delta v_j = \alpha (x - v_j)\, u_{ij}^2 \tag{12b}$$

The adaptation of the prototypes during the learning process depends on the
learning rate $\alpha \in [0, 1)$, which is a monotonically decreasing function of
the number of iterations $t$, defined as $\alpha = \alpha(t) = \alpha_0(1 - t/N)$, where $\alpha_0$
is the initial value of the learning rate and $N$ is the total number of iterations,
predetermined for the learning process.
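One presentation of an input under eqns.(10), (12a) and (12b) can be sketched as below. The function name `falvq_step` and the toy values are illustrative assumptions, and the sketch assumes the input does not coincide with every prototype at once (which would make the membership ratio undefined).

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def falvq_step(x, protos, alpha):
    """Return (winner index, memberships, updated prototypes) for one input."""
    d = [sq_dist(x, v) for v in protos]
    i = min(range(len(d)), key=d.__getitem__)
    k = len(d)
    # Eqn. (10): d[i] / (d[i] + d[r]) is the same as 1 / (1 + d[r] / d[i]).
    u = [1.0 if r == i else d[i] / (d[i] + d[r]) for r in range(k)]
    gains = [u[r] ** 2 for r in range(k)]                                 # (12b)
    gains[i] = 1.0 + sum((1.0 - u[r]) ** 2 for r in range(k) if r != i)   # (12a)
    new = [[vc + alpha * g * (xc - vc) for vc, xc in zip(v, x)]
           for v, g in zip(protos, gains)]
    return i, u, new

x = (0.0, 0.0)
protos = [(1.0, 0.0), (3.0, 0.0)]
i, u, new = falvq_step(x, protos, alpha=0.1)
```

In this example the squared distances are 1 and 9, so the non-winner membership is 0.1 and its prototype barely moves, while the winner moves by a factor of 1.81 times the learning rate.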


2.3.2  Other  Fuzzy  Algorithms 

As seen in the previous section, $u_{ir} = 1$ if $v_r = v_i$, where $v_i$ is the
winning prototype, that is, $\|x - v_i\|^2 = \min_{v_j \in V} \|x - v_j\|^2$. If $v_r \neq v_i$,
then $\frac{\|x - v_r\|^2}{\|x - v_i\|^2} > 1$ $\forall v_r \neq v_i$ and therefore $u_{ir} < \frac{1}{2}$. Since
$u_{ir} \in (0, \frac{1}{2})$ $\forall r \neq i$, the function $u_{ir}$ described in eqn.(10) favours the
winning prototype rather strongly, and hence there is a bias towards the winner
inherent in the definition of $u_{ir}$. So the weight of $\|x - v_r\|^2$ in $J$ lies
between 0 and $\frac{1}{2}$. In other words, the contribution of $\|x - v_r\|^2$ towards $J$
is restricted, reducing the competitive effect of the non-winner.

The non-winning prototypes can be made more competitive by introducing
a new set of generalized membership functions such that, if $v_r \neq v_i$,
$u_{ir}$ takes values in the interval $(0, \beta_r)$, where $\beta_r > \frac{1}{2}$.

This is done by introducing a new set of generalized membership functions
of the form

$$u_{ir} = \begin{cases} 1 & \text{if } r = i \\ \dfrac{1}{1 + \dfrac{\|x - v_r\|^2}{D(x, v_j \in V)}} & \text{if } r \neq i \end{cases} \tag{13}$$

where $D(x, v_j \in V)$ is a differentiable function of $\|x - v_j\|^2$, $v_j \in V$, such
that $D(x, v_j \in V) > \min_{v_j \in V} \|x - v_j\|^2$, where $\min_{v_j \in V} \|x - v_j\|^2 = \|x - v_i\|^2$
if $v_i$ is the winning prototype. $u_{ir}$ in eqn.(13) can also be written as

$$u_{ir} = \begin{cases} 1 & \text{if } r = i \\ \dfrac{1}{1 + \dfrac{\|x - v_r\|^2}{\|x - v_i\|^2} \cdot \dfrac{\|x - v_i\|^2}{D(x, v_j \in V)}} & \text{if } r \neq i \end{cases}$$

Thus the value of $\beta_r$ depends on the ratio $\frac{\|x - v_i\|^2}{D(x, v_j \in V)}$.

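The three existing algorithms [7] using this concept are stated below:

Harmonic FALVQ:

Here

$$\frac{1}{D(x, v_j \in V)} = \frac{1}{D_H(x, v_j \in V)} = \frac{1}{k} \sum_{j=1}^{k} \frac{1}{\|x - v_j\|^2}$$

The update rules obtained using the same objective function as (9) are

$$\Delta v_i = \alpha (x - v_i)\left(1 + \frac{1}{k} \sum_{r \neq i} u_{ir}^2 \left(\frac{\|x - v_r\|^2}{\|x - v_i\|^2}\right)^2\right) \tag{14a}$$

for the winning prototype and

$$\Delta v_j = \alpha (x - v_j)\left(u_{ij}^2 + \frac{1}{k} \sum_{r \neq i} u_{ir}^2 \left(\frac{\|x - v_r\|^2}{\|x - v_j\|^2}\right)^2\right) \tag{14b}$$

for the non-winning prototypes $v_j \neq v_i$.

Geometric FALVQ:

Here $D(x, v_j \in V) = D_G(x, v_j \in V) = \left(\prod_{j=1}^{k} \|x - v_j\|^2\right)^{1/k}$.
The update equations are:

$$\Delta v_i = \alpha (x - v_i)\left(1 + \frac{1}{k} \sum_{r \neq i} u_{ir}(1 - u_{ir}) \frac{\|x - v_r\|^2}{\|x - v_i\|^2}\right) \tag{15a}$$

for the winner prototype and

$$\Delta v_j = \alpha (x - v_j)\left(u_{ij}^2 + \frac{1}{k} \sum_{r \neq i} u_{ir}(1 - u_{ir}) \frac{\|x - v_r\|^2}{\|x - v_j\|^2}\right) \tag{15b}$$

for the non-winner prototype $v_j \neq v_i$.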


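Arithmetic FALVQ:

Here $D(x, v_j \in V) = D_A(x, v_j \in V) = \frac{1}{k} \sum_{j=1}^{k} \|x - v_j\|^2$.

The update equations are:

$$\Delta v_i = \alpha (x - v_i)\left(1 + \frac{1}{k} \sum_{r \neq i} (1 - u_{ir})^2\right) \tag{16a}$$

for the winner prototype and

$$\Delta v_j = \alpha (x - v_j)\left(u_{ij}^2 + \frac{1}{k} \sum_{r \neq i} (1 - u_{ir})^2\right) \tag{16b}$$

for the non-winner prototype $v_j \neq v_i$.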


The analysis of the algorithms using the harmonic, geometric and arithmetic
means is done in [7]. For both Harmonic FALVQ and Geometric FALVQ, the update
equations are such that, if $\|x - v_r\|^2 \gg \|x - v_i\|^2$, the update of the
winning prototype towards the input $x$ is increased, while if
$\|x - v_r\|^2 \approx \|x - v_i\|^2$ the effect of the input $x$ on the winning prototype
is inhibited, i.e. the update of the winner towards $x$ is decreased. Again,
$u_{ir}(1 - u_{ir})$ in eqn.(15a) is an increasing function if $0 < u_{ir} < \frac{1}{2}$
and a decreasing function if $u_{ir} > \frac{1}{2}$. Since in Geometric FALVQ $u_{ir} > \frac{1}{2}$
when $\|x - v_r\|^2$ is sufficiently close to $\|x - v_i\|^2$, $u_{ir}(1 - u_{ir})$ decreases as the
non-winning prototype $v_r$ approaches $v_i$, while $u_{ir}^2$ in Harmonic FALVQ is a
monotonically increasing function of $u_{ir}$. Thus non-winning prototypes
close to the winning prototype $v_i$ result in stronger inhibition of its
adaptation when Geometric FALVQ is used, and so Geometric FALVQ results
in stronger competition. In Arithmetic FALVQ, although each input $x$ has
a stronger effect on the winner than on the non-winner (the update of the winning
prototype is greater than the update of a non-winner prototype), the
difference between $u_{ij}^2$ and 1 is not significant, so Arithmetic FALVQ cannot
discriminate between prototypes which are similar.
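The effect of the three choices of $D$ in eqn.(13) on the memberships can be compared numerically. The sketch below assumes strictly positive squared distances; the helper names `means` and `memberships` and the toy distances are illustrative.

```python
import math

def means(d):
    """Harmonic, geometric and arithmetic means of positive squared distances."""
    k = len(d)
    harmonic = k / sum(1.0 / dj for dj in d)
    geometric = math.exp(sum(math.log(dj) for dj in d) / k)
    arithmetic = sum(d) / k
    return harmonic, geometric, arithmetic

def memberships(d, i, D):
    """Eqn. (13): 1 for the winner i, 1 / (1 + d_r / D) for the others."""
    return [1.0 if r == i else 1.0 / (1.0 + d[r] / D) for r in range(len(d))]

d = [1.0, 2.0, 8.0]      # squared distances; index 0 is the winner
i = 0
D_h, D_g, D_a = means(d)
u_h, u_g, u_a = (memberships(d, i, D) for D in (D_h, D_g, D_a))

# Upper bound beta on a non-winner membership, reached as d_r tends to d_i.
beta_h, beta_g, beta_a = (1.0 / (1.0 + d[i] / D) for D in (D_h, D_g, D_a))
```

Since the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean, the non-winner memberships and the bound $\beta_r$ grow in that order, and all three bounds lie between $\frac{1}{2}$ and 1.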


3  Proposed  Method 

The generalized membership functions used in the algorithms HFALVQ,
GFALVQ and AFALVQ are of the form shown in eqn.(13), where the value of
$u_{ir}$ lies in $(0, \beta_r)$ for $r \neq i$ and $\beta_r > \frac{1}{2}$. The value of $\beta_r$ depends on
the ratio $\frac{\|x - v_i\|^2}{D(x, v_j \in V)}$. However, the ratio varies from problem to problem
and cannot be made equal to zero. So the value of $\beta_r$ cannot reach 1, and
hence there always exists a bias inherent in the definition of $u_{ir}$. In fact, $u_{ir}$
is not a continuous function in any of these cases. Again, it was found
experimentally that these existing algorithms, though they work sufficiently
well for the IRIS data set, where there are 3 equal classes of 50 data points
each, fail for data sets in which unequal sized classes are present. For these
cases the cluster centers produced by the algorithms are very different from the
physical cluster centers.

So a new algorithm has to be developed that works equally well for data
sets having equal sized classes and, at the same time, produces for unequal
classes better cluster centers that are close to the physical cluster centers.

To remove the bias in $u_{ir}$ completely, a new set of generalized membership
functions is chosen that satisfies the four conditions for a genuinely
competitive learning vector quantization. The new generalized membership
function is:

$$u_{ir} = \exp\left(1 - \frac{\|x - v_r\|^2}{\|x - v_i\|^2}\right) \tag{17}$$

This is a continuous function taking values in $(0, 1]$.

The objective function $J$ is taken to be the same as before:

$$J = \sum_{r=1}^{k} u_{ir} \|x - v_r\|^2$$

where $v_1, v_2, \ldots, v_k \in \mathbb{R}^p$ is the set of prototypes, $k$ is the number of
clusters and $k \geq 2$.

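The update equations for the winning prototype and the non-winning
prototypes are derived by minimizing $J$, using the $u_{ir}$ given in eqn.(17).

Derivation of the update equation for the winning prototype:

$$J = \sum_{r=1}^{k} u_{ir} \|x - v_r\|^2 = \|x - v_i\|^2 + \sum_{r \neq i} u_{ir} \|x - v_r\|^2$$

Differentiating $J$ with respect to the winning prototype $v_i$ gives

$$\begin{aligned}
\frac{\partial J}{\partial v_i} &= -2(x - v_i) + \frac{\partial}{\partial v_i}\Big(\sum_{r \neq i} u_{ir} \|x - v_r\|^2\Big) \\
&= -2(x - v_i) + \sum_{r \neq i} \frac{\partial u_{ir}}{\partial v_i} \|x - v_r\|^2 \\
&= -2(x - v_i) + \sum_{r \neq i} u_{ir}\, \frac{\partial}{\partial v_i}\Big(-\frac{\|x - v_r\|^2}{\|x - v_i\|^2}\Big) \|x - v_r\|^2 \\
&= -2(x - v_i) + \sum_{r \neq i} u_{ir}\, \frac{\|x - v_r\|^2}{\|x - v_i\|^4}\, \frac{\partial \|x - v_i\|^2}{\partial v_i}\, \|x - v_r\|^2 \\
&= -2(x - v_i) - 2 \sum_{r \neq i} u_{ir} \Big(\frac{\|x - v_r\|^2}{\|x - v_i\|^2}\Big)^2 (x - v_i) \\
&= -2\Big(1 + \sum_{r \neq i} u_{ir} \Big(\frac{\|x - v_r\|^2}{\|x - v_i\|^2}\Big)^2\Big)(x - v_i)
\end{aligned}$$

Derivation of the update equation of the non-winning prototype:

Differentiating $J$ with respect to a non-winning prototype $v_j$, $j \neq i$, gives

$$\begin{aligned}
\frac{\partial J}{\partial v_j} &= \frac{\partial}{\partial v_j}\Big(\|x - v_i\|^2 + \sum_{r \neq i} u_{ir} \|x - v_r\|^2\Big) \\
&= \frac{\partial}{\partial v_j}\Big(u_{ij} \|x - v_j\|^2 + \sum_{r \neq i, j} u_{ir} \|x - v_r\|^2\Big) \\
&= \Big(\frac{\partial u_{ij}}{\partial v_j}\Big) \|x - v_j\|^2 + u_{ij} \frac{\partial}{\partial v_j}\big(\|x - v_j\|^2\big) \\
&= u_{ij}\, \frac{\partial}{\partial v_j}\Big(-\frac{\|x - v_j\|^2}{\|x - v_i\|^2}\Big) \|x - v_j\|^2 - 2 u_{ij} (x - v_j) \\
&= 2 u_{ij}\, \frac{\|x - v_j\|^2}{\|x - v_i\|^2} (x - v_j) - 2 u_{ij} (x - v_j) \\
&= 2 u_{ij} (x - v_j)\Big(\frac{\|x - v_j\|^2}{\|x - v_i\|^2} - 1\Big)
\end{aligned}$$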

So, the update equation of the winning prototype is:

$$v_i(t + 1) = v_i(t) + \Delta v_i$$

where

$$\Delta v_i = -\alpha \frac{\partial J}{\partial v_i}$$

or,

$$\Delta v_i = \alpha' \left(1 + \sum_{r \neq i} u_{ir} \left(\frac{\|x - v_r\|^2}{\|x - v_i\|^2}\right)^2\right)(x - v_i) \tag{18a}$$

The update equation for the non-winning prototype is:

$$v_j(t + 1) = v_j(t) + \Delta v_j$$

where

$$\Delta v_j = \alpha \frac{\partial J}{\partial v_j}$$

Here, the sign is taken positive to make the updated $v_j$ move closer to $x$.
So,

$$\Delta v_j = \alpha' u_{ij} \left(\frac{\|x - v_j\|^2}{\|x - v_i\|^2} - 1\right)(x - v_j) \tag{18b}$$

where $\alpha' = 2\alpha$ and $\alpha$ is the learning rate.
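The comparison between the update equations of the winning and the
non-winning prototypes is based on the observation that, for $j \neq i$,

$$\begin{aligned}
& \frac{\|x - v_j\|^2}{\|x - v_i\|^2} \geq 1 \\
\Rightarrow \; & u_{ij} \frac{\|x - v_j\|^2}{\|x - v_i\|^2} \leq u_{ij} \left(\frac{\|x - v_j\|^2}{\|x - v_i\|^2}\right)^2 \\
\Rightarrow \; & u_{ij} \frac{\|x - v_j\|^2}{\|x - v_i\|^2} - u_{ij} < u_{ij} \left(\frac{\|x - v_j\|^2}{\|x - v_i\|^2}\right)^2 + 1 \\
\Rightarrow \; & u_{ij} \left(\frac{\|x - v_j\|^2}{\|x - v_i\|^2} - 1\right) < 1 + \sum_{r \neq i} u_{ir} \left(\frac{\|x - v_r\|^2}{\|x - v_i\|^2}\right)^2
\end{aligned} \tag{19}$$

Clearly, eqn.(19) indicates that the input vector $x$ has a more significant
effect on the winning prototype, i.e. the update for the winner is
greater than the update for the non-winner.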

The adaptation of the winning prototype $v_i$ can be investigated by studying
the term $u_{ir} \left(\frac{\|x - v_r\|^2}{\|x - v_i\|^2}\right)^2$ in eqn.(18a), which represents the effect of
the non-winning prototypes. Assume that $v_r$ is a non-winning prototype
such that $\|x - v_r\|^2 \gg \|x - v_i\|^2$. According to the definition of $u_{ir}$, in this case,
when $r \neq i$, $u_{ir} \to 0$. Hence $u_{ir} \left(\frac{\|x - v_r\|^2}{\|x - v_i\|^2}\right)^2 \to 0$, so $\Delta v_i$ given by eqn.(18a)
is small. However, if $\|x - v_r\|^2 \approx \|x - v_i\|^2$, then $u_{ir} \to 1$, so $\Delta v_i$
increases. In summary, the presence of $v_r$ such that $\|x - v_r\|^2 \approx \|x - v_i\|^2$
increases the update of the winning prototype towards the input $x$, while
the presence of $v_r$ such that $\|x - v_r\|^2 \gg \|x - v_i\|^2$ decreases the update of
the winning prototype towards $x$. This method of competition is intuitively
reasonable.
The algorithm can be summarized as follows:

1. Select the codebook size $k$; fix $\alpha_0$, $N$; set $t = 0$;
randomly generate the initial set of prototypes $V = (v_1, v_2, \ldots, v_k)$.

2. Calculate $\alpha = \alpha_0(1 - t/N)$.

3. For each input vector $x$:

• find $\|x - v_i\|^2 = \min_{v_r \in V} \{\|x - v_r\|^2\}$

• evaluate $u_{ir}$ according to eqn.(17).

• update the winning prototype $v_i$ and the non-winning prototypes
$v_j \neq v_i$ according to eqn.(18a) and (18b) respectively.

4. If $t < N$, then $t = t + 1$ and go to step 2.
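The summarized algorithm can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' code: the function names, the toy data, the fixed (rather than random) initial prototypes, and the decision to skip an update when the input coincides exactly with the winner (where the ratio in eqn.(17) is undefined) are all assumptions. The parameter `alpha` plays the role of the step size multiplying eqns.(18a) and (18b).

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def proposed_step(x, protos, alpha):
    """One presentation of x using eqns. (17), (18a) and (18b)."""
    d = [sq_dist(x, v) for v in protos]
    i = min(range(len(d)), key=d.__getitem__)
    if d[i] == 0.0:                  # assumption: skip the degenerate case
        return protos
    k = len(d)
    z = [dr / d[i] for dr in d]                      # distance ratios, z[i] = 1
    u = [math.exp(1.0 - zr) for zr in z]             # eqn. (17), u[i] = 1
    gain = [u[j] * (z[j] - 1.0) for j in range(k)]   # non-winners, eqn. (18b)
    gain[i] = 1.0 + sum(u[r] * z[r] ** 2 for r in range(k) if r != i)  # (18a)
    return [[vc + alpha * g * (xc - vc) for vc, xc in zip(v, x)]
            for v, g in zip(protos, gain)]

def train(data, protos, alpha0, n_iter):
    """Steps 2 to 4: linearly decaying step size, one pass over X per t."""
    protos = [list(v) for v in protos]
    for t in range(n_iter):
        alpha = alpha0 * (1.0 - t / n_iter)
        for x in data:
            protos = proposed_step(x, protos, alpha)
    return protos

data = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2),
        (5.0, 5.0), (5.2, 5.0), (5.0, 5.2), (5.2, 5.2)]
protos = train(data, [(1.0, 1.0), (4.0, 4.0)], alpha0=0.05, n_iter=50)
```

On this well separated toy data the non-winner memberships are vanishingly small, so each prototype settles near the mean of its own group.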


The performance of the existing algorithms for data sets having unequal
sized classes is poor (section 4). The cluster centers produced by them are
away from the class means. This is because the updates for the non-winner
prototypes in these methods, given by equations (12b), (14b), (15b) and
(16b), are comparatively large. So, in the initial phase of the learning
process, the total update for the prototype vector of the smaller class as the
non-winner (when an $x$ belonging to the larger class is processed) is much
larger than its total update as the winner (when an $x$ belonging to the
smaller class is processed). So the prototype corresponding to the smaller
class is pulled away from the physical class mean, towards the larger class.

We show below that the non-winner update for the proposed method
is comparatively smaller than those for the existing methods, thus reducing
the displacement of the cluster center for the smaller class from its physical
class mean.

For the proposed method, the factor $u_{ij}\left(\frac{\|x - v_j\|^2}{\|x - v_i\|^2} - 1\right)$ in eqn.(18b), with
$u_{ij}$ specified by eqn.(17), has a maximum value of $\frac{1}{e} \approx 0.37$. During the
initial phase of the learning process, $\|x - v_r\|^2 \approx \|x - v_i\|^2$, and so in this
phase the value of $u_{ir}$ for Harmonic FALVQ, Geometric FALVQ and Arithmetic
FALVQ will be greater than $\frac{1}{2}$. So, in $\Delta v_j$ given by eqns.(14b), (15b), (16b),
the term $u_{ij}^2 > \frac{1}{4} = 0.25$. So, in many cases, it happens that for these
algorithms the value of the term contributing to the update of the non-winner
is greater than 0.37, or approximately that; that is, $\Delta v_j$ for these is greater
than the $\Delta v_j$ for the proposed method. Due to this, the non-winner prototype
will be pulled more towards $x$ by these algorithms than by the proposed one.
In FALVQ $u_{ir} \in (0, \frac{1}{2})$, so $u_{ir}^2 < \frac{1}{4}$. Hence the difference between $\Delta v_j$ for
FALVQ and the proposed method is not very significant. However, as
already stated, the bias present in the definition of $u_{ir}$ of FALVQ deteriorates
the performance of FALVQ.
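The bound on the non-winner factor of eqn.(18b) can be checked numerically. Writing $z = \|x - v_j\|^2 / \|x - v_i\|^2 \geq 1$, the factor is $e^{1 - z}(z - 1)$, whose derivative $e^{1 - z}(2 - z)$ vanishes at $z = 2$, giving the maximum $1/e$. The grid search below is an illustrative sketch; the grid range and spacing are arbitrary assumptions.

```python
import math

def nonwinner_gain(z):
    """u_ij * (z - 1) with u_ij = exp(1 - z), for a distance ratio z >= 1."""
    return math.exp(1.0 - z) * (z - 1.0)

# Coarse grid over z in [1, 21] with spacing 0.001.
grid = [1.0 + 0.001 * n for n in range(20001)]
best_z = max(grid, key=nonwinner_gain)
best = nonwinner_gain(best_z)
```

The factor is zero at $z = 1$, peaks at $z = 2$, and decays towards zero for distant non-winners, so non-winners that are either nearly tied with the winner or far from the input are barely moved.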

The proposed method is also not affected by scaling of the data. It is
scale invariant in the sense of the following proposition.

Proposition:

Let $X = \{x_1, x_2, \ldots, x_n\} \subset \mathbb{R}^p$ and
let $V_0 = (v_{1,0}, v_{2,0}, \ldots, v_{k,0})$, $v_{i,0} \in \mathbb{R}^p$, be a set of initial prototypes.

Let $\tau$ be a fixed positive number and define sets of scaled data and initial
prototypes by:

$$Y = \{y_1, y_2, \ldots, y_n\} = \tau X = \{\tau x_1, \tau x_2, \ldots, \tau x_n\}$$

and

$$W_0 = (w_{1,0}, w_{2,0}, \ldots, w_{k,0}) = \tau V_0 = (\tau v_{1,0}, \tau v_{2,0}, \ldots, \tau v_{k,0})$$

Then applying the proposed method to $X$, initialized by $V_0$, is equivalent
to applying the proposed method to $Y$, initialized by $W_0$, in the sense that

$$w_{j,t} = \tau v_{j,t}, \quad j = 1, 2, \ldots, k \text{ and } t = 0, 1, \ldots$$

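Proof: We first show that the membership values $\{u_{ir}\}$ for $X$ and $Y$ are
identical: they involve ratios of norm values, so the scaling factor $\tau$
cancels out,

$$\exp\left(1 - \frac{\|y - w_r\|^2}{\|y - w_i\|^2}\right) = \exp\left(1 - \frac{\|\tau x - \tau v_r\|^2}{\|\tau x - \tau v_i\|^2}\right) = \exp\left(1 - \frac{\|x - v_r\|^2}{\|x - v_i\|^2}\right)$$

For the same reason the winner index $i$ is the same for $x$ and $y = \tau x$, since
every squared distance is multiplied by the same factor $\tau^2$.

We use induction to prove the proposition. The result holds for $t = 0$ by
hypothesis. We now assume it to hold for arbitrary $t - 1$, and show that it
holds for $t$.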
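By eqn.(18a),

$$w_{i,t} = w_{i,t-1} + \alpha' \left(1 + \sum_{r \neq i} u_{ir} \left(\frac{\|y - w_{r,t-1}\|^2}{\|y - w_{i,t-1}\|^2}\right)^2\right)(y - w_{i,t-1})$$

for the winner prototype.

So,

$$\begin{aligned}
w_{i,t} &= \tau v_{i,t-1} + \alpha' \Big(1 + \sum_{r \neq i} u_{ir} \Big(\frac{\|\tau x - \tau v_{r,t-1}\|^2}{\|\tau x - \tau v_{i,t-1}\|^2}\Big)^2\Big)(\tau x - \tau v_{i,t-1}) \\
&= \tau v_{i,t-1} + \alpha' \Big(1 + \sum_{r \neq i} u_{ir} \Big(\frac{\|x - v_{r,t-1}\|^2}{\|x - v_{i,t-1}\|^2}\Big)^2\Big)\, \tau (x - v_{i,t-1}) \\
&= \tau v_{i,t-1} + \alpha' \tau \Big(1 + \sum_{r \neq i} u_{ir} \Big(\frac{\|x - v_{r,t-1}\|^2}{\|x - v_{i,t-1}\|^2}\Big)^2\Big)(x - v_{i,t-1}) \\
&= \tau \Big[v_{i,t-1} + \alpha' \Big(1 + \sum_{r \neq i} u_{ir} \Big(\frac{\|x - v_{r,t-1}\|^2}{\|x - v_{i,t-1}\|^2}\Big)^2\Big)(x - v_{i,t-1})\Big]
\end{aligned}$$

Since $u_{ir}$ is independent of $\tau$,

$$w_{i,t} = \tau v_{i,t}$$

By eqn.(18b),

$$w_{j,t} = w_{j,t-1} + \alpha' u_{ij} \left(\frac{\|y - w_{j,t-1}\|^2}{\|y - w_{i,t-1}\|^2} - 1\right)(y - w_{j,t-1})$$

for a non-winner prototype $j \neq i$.
So,

$$\begin{aligned}
w_{j,t} &= \tau v_{j,t-1} + \alpha' u_{ij} \Big(\frac{\|\tau x - \tau v_{j,t-1}\|^2}{\|\tau x - \tau v_{i,t-1}\|^2} - 1\Big)(\tau x - \tau v_{j,t-1}) \\
&= \tau v_{j,t-1} + \alpha' \tau\, u_{ij} \Big(\frac{\|x - v_{j,t-1}\|^2}{\|x - v_{i,t-1}\|^2} - 1\Big)(x - v_{j,t-1}) \\
&= \tau \Big[v_{j,t-1} + \alpha' u_{ij} \Big(\frac{\|x - v_{j,t-1}\|^2}{\|x - v_{i,t-1}\|^2} - 1\Big)(x - v_{j,t-1})\Big]
\end{aligned}$$

Since $u_{ij}$ is independent of $\tau$,

$$w_{j,t} = \tau v_{j,t} \quad \text{for } j \neq i$$

So, for each $t$,

$$w_{i,t} = \tau v_{i,t}$$
$$w_{j,t} = \tau v_{j,t}, \quad j \neq i$$

Hence the proposition.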
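The proposition can also be checked empirically by running the same update rules on a data set and on a scaled copy of it. The sketch below is an illustration under assumed values: the function names, the toy data, the scaling factor `tau`, and the step-size schedule are all hypothetical, and the update with an input that coincides exactly with the winner is skipped as an assumption.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def step(x, protos, alpha):
    """One presentation of x using eqns. (17), (18a) and (18b)."""
    d = [sq_dist(x, v) for v in protos]
    i = min(range(len(d)), key=d.__getitem__)
    if d[i] == 0.0:                  # assumption: skip the degenerate case
        return protos
    k = len(d)
    z = [dr / d[i] for dr in d]
    u = [math.exp(1.0 - zr) for zr in z]
    gain = [u[j] * (z[j] - 1.0) for j in range(k)]
    gain[i] = 1.0 + sum(u[r] * z[r] ** 2 for r in range(k) if r != i)
    return [[vc + alpha * g * (xc - vc) for vc, xc in zip(v, x)]
            for v, g in zip(protos, gain)]

def run(data, protos, alpha0, n_iter):
    protos = [list(v) for v in protos]
    for t in range(n_iter):
        alpha = alpha0 * (1.0 - t / n_iter)
        for x in data:
            protos = step(x, protos, alpha)
    return protos

tau = 10.0
X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.3)]
V0 = [(1.0, 1.0), (4.0, 4.0)]
Y = [tuple(tau * c for c in x) for x in X]
W0 = [tuple(tau * c for c in v) for v in V0]

V = run(X, V0, alpha0=0.05, n_iter=20)
W = run(Y, W0, alpha0=0.05, n_iter=20)
max_err = max(abs(wc - tau * vc) for w, v in zip(W, V) for wc, vc in zip(w, v))
```

Up to floating-point rounding, `max_err` is zero: the prototypes obtained from the scaled data are the scaled prototypes of the original run.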

4  Experimental  Results  and  Comparison 

In order to judge the performance of the algorithms, we have used the
following measures:

• the number of misclassifications, where the classification of each data
point is known.

• a measure $Z$ (named "total distortion"), which is defined as:

$$Z = \sum_{x \in X} \sum_{r=1}^{k} u_{ir} \|x - v_r\|^2$$

(Note that $u_{ir} = 1$ for $r = i$ for all the algorithms. When $r \neq i$, each
algorithm provides its own value of $u_{ir}$.)
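The total distortion can be computed for any membership rule by passing the rule in as a function. In the sketch below, `exponential` implements eqn.(17), while `crisp` (membership 0 for every non-winner) is a hypothetical baseline added only for comparison; the function names and toy data are assumptions.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def crisp(d, i):
    """Hypothetical baseline: only the winner contributes."""
    return [1.0 if r == i else 0.0 for r in range(len(d))]

def exponential(d, i):
    """Membership of eqn. (17); falls back to crisp if x equals the winner."""
    if d[i] == 0.0:
        return crisp(d, i)
    return [math.exp(1.0 - dr / d[i]) for dr in d]

def total_distortion(data, protos, membership):
    """Z = sum over x and r of u_ir * ||x - v_r||^2."""
    total = 0.0
    for x in data:
        d = [sq_dist(x, v) for v in protos]
        i = min(range(len(d)), key=d.__getitem__)
        u = membership(d, i)
        total += sum(ur * dr for ur, dr in zip(u, d))
    return total

data = [(0.0, 0.0), (4.0, 0.0)]
protos = [(1.0, 0.0), (3.0, 0.0)]
z_crisp = total_distortion(data, protos, crisp)
z_exp = total_distortion(data, protos, exponential)
```

Because every algorithm supplies its own non-winner memberships, values of $Z$ are most directly comparable between runs of the same algorithm.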

4.1  Results  on  IRIS  data 

The GLVQ, FALVQ, Harmonic FALVQ, Geometric FALVQ, Arithmetic
FALVQ and the proposed algorithm were tested using Anderson's IRIS data
set, which has been used extensively for evaluating the performance of
pattern classification algorithms [1]. This data set contains 150 feature
vectors of length four, which belong to 3 classes representing different IRIS
subspecies. Each class contains 50 feature vectors. One of the 3 classes is
well separated from the other two, which are not easily separable due to the
existence of similar vectors. The performance of the algorithms tested on
this data set is evaluated by counting the number of classification errors,
i.e., the number of feature vectors that are assigned to a wrong cluster by
the algorithms [6,7].

The raw IRIS data set was classified by GLVQ, the different FALVQ
algorithms, and the proposed algorithm with $N = 500$ and different initial
values of the learning rate. Table 1 shows the corresponding results.

GLVQ, FALVQ and Harmonic FALVQ resulted in 16 or 17 classification
errors when applied to the raw IRIS data set. This is typical when
the IRIS data set is classified by unsupervised algorithms. The proposed
method also has 16 or 17 classification errors on this data set. Geometric
FALVQ has a slightly better performance, as it misclassified 12 feature
vectors. The Arithmetic FALVQ algorithm is clearly inferior to the others,
since it has 23 classification errors.

Scaled IRIS data (IRIS/10) is also used for testing the algorithms. The
GLVQ algorithm, being sensitive to scaling of the data, resulted in 50
classification errors with $\alpha = 0.6$, whereas FALVQ, Harmonic FALVQ, Geometric
FALVQ and Arithmetic FALVQ are scale invariant. The proposed algorithm,
as already proved, is also scale invariant. Hence their performance is not
affected when the scaled IRIS data is used.

The performance of these algorithms is also tested using a data set
containing classes of unequal size. Three classes are formed from the IRIS
data set, where class 1, class 2 and class 3 contain 50, 30 and 10 feature
vectors respectively, taken from the corresponding class 1, class 2 and
class 3 feature vectors of the IRIS data set. The number of feature vectors
in this data set is therefore 90. The algorithms are tested on this data set
with N = 500 and $\alpha_0$ = 0.05. The proposed method, along with FALVQ,
gives the best performance, as it misclassifies only 2 feature vectors.
Harmonic FALVQ misclassifies 4 feature vectors. Arithmetic FALVQ performs
worst, since it misclassifies 26 feature vectors. Table 3 shows the
corresponding results.
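
The construction of the unequal sized data set can be sketched as follows. The text does not say which 50, 30 and 10 vectors are taken from each class, so the sketch assumes the first ones in storage order. `take_per_class` is a hypothetical helper, not something from the paper.

```python
def take_per_class(labels, sizes):
    """Return the indices of the first sizes[c] vectors of each class c.

    Assumption: the vectors kept are the first ones encountered for each
    class; the text does not specify the selection rule.
    """
    remaining = dict(sizes)
    kept = []
    for index, label in enumerate(labels):
        if remaining.get(label, 0) > 0:
            kept.append(index)
            remaining[label] -= 1
    return kept


# IRIS has 150 vectors, 50 per class; here they are assumed to be
# stored class by class.
iris_labels = [1] * 50 + [2] * 50 + [3] * 50
subset = take_per_class(iris_labels, {1: 50, 2: 30, 3: 10})
print(len(subset))  # 90 vectors: 50 + 30 + 10
```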


4.2 Results on Artificial Data Sets Generated in $\mathbb{R}^2$

The performance of the algorithms is also tested using artificial data
sets. Each artificial data set contains two classes of unequal size,
generated using 1000 points that are uniformly distributed within each
class. Class 1 has an a priori probability of 0.8, while class 2 has an
a priori probability of 0.2, so class 1 has a larger number of points than
class 2. The number of points in class 1 is 790, whereas in class 2 it is
210. Figures 3, 10 and 17 show the two classes with interclass distances of
0.5, 0.2 and 0.01 units respectively. Class 1 has a radius of 2 units with
its center at (0,0). Class 2 has a radius of 1 unit with its center at
(3 + interclass distance, 0). The physical class means are therefore the
class centers.
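
A data set of this kind can be generated with a few lines of code. The sketch below is only an illustration under stated assumptions: each class is taken to be a disc (since the text gives a radius), points are drawn uniformly over the disc area, and the seed and function names are arbitrary choices that do not come from the paper.

```python
import math
import random


def sample_disc(n, center, radius, rng):
    """Draw n points uniformly over the area of a disc."""
    points = []
    for _ in range(n):
        # sqrt() compensates for the area growing with r, so that the
        # density is uniform over the disc rather than peaked at the center.
        r = radius * math.sqrt(rng.random())
        theta = 2.0 * math.pi * rng.random()
        points.append((center[0] + r * math.cos(theta),
                       center[1] + r * math.sin(theta)))
    return points


def make_artificial_data(interclass_distance, n1=790, n2=210, seed=0):
    """Two classes: radius 2 at (0, 0) and radius 1 at (3 + distance, 0)."""
    rng = random.Random(seed)
    class1 = sample_disc(n1, (0.0, 0.0), 2.0, rng)
    class2 = sample_disc(n2, (3.0 + interclass_distance, 0.0), 1.0, rng)
    return class1, class2


class1, class2 = make_artificial_data(0.5)
print(len(class1), len(class2))  # 790 210
```

With these centers and radii the gap between the two disc boundaries is (3 + d) - 2 - 1 = d, which matches the interclass distances of 0.5, 0.2 and 0.01 used in the experiments.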

The performance of the algorithms on these data sets is shown in Tables
4, 5 and 6, with N = 500 and $\alpha_0$ = 0.005. For an interclass distance
of 0.5, the proposed method classifies all feature vectors correctly. For
interclass distances of 0.2 and 0.01, the proposed method also gives the
best performance, misclassifying 4 and 22 feature vectors respectively. As
can be seen from Tables 4, 5 and 6, the number of misclassifications by the
proposed method is less than that of the other algorithms. For all these
data sets, the next best performance is given by FALVQ and the worst
performance is given by Arithmetic FALVQ. Tables 7, 8 and 9 list the
cluster centers obtained by the algorithms for the artificial data sets,
with N = 500 and $\alpha_0$ = 0.005. Figures 24, 25 and 26 give the graphical
representation of the same cluster centers. In each graph, the cluster
centers produced by an algorithm are marked by a number. The left position
of the number represents the center for class 1, while the right position
is the center for class 2.

It can be seen that, for the existing methods, the cluster center for class
2 is pulled towards the larger class, while the proposed algorithm
obtains the cluster center for class 2 closest to its class mean.
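
This observation can be checked numerically from the tabulated centers. The short sketch below uses the class 2 centers reported in Table 9 (interclass distance 0.01) and ranks the algorithms by Euclidean distance from the actual class center (3.01, 0). The variable and function names are illustrative.

```python
import math

# Class 2 cluster centers as reported in Table 9 (interclass distance 0.01).
class2_centers = {
    "GLVQ": (2.44, 0.085),
    "FALVQ": (2.6, 0.052),
    "Harmonic FALVQ": (2.53, 0.04),
    "Geometric FALVQ": (2.07, 0.05),
    "Arithmetic FALVQ": (1.404, 0.036),
    "Proposed Method": (3.12, 0.010),
}
actual_center = (3.01, 0.0)


def offset(center):
    """Euclidean distance between a cluster center and the actual center."""
    return math.hypot(center[0] - actual_center[0],
                      center[1] - actual_center[1])


# Algorithms ordered from closest to farthest from the actual center.
ranking = sorted(class2_centers, key=lambda name: offset(class2_centers[name]))
for name in ranking:
    x = class2_centers[name][0]
    side = "towards class 1" if x < actual_center[0] else "away from class 1"
    print(f"{name:18s} offset = {offset(class2_centers[name]):.3f} ({side})")
```

Class 1 lies to the left of class 2, so an x-coordinate smaller than 3.01 means the center has moved towards the larger class.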

The performance of the algorithms is also tested using the distortion
measure Z. Table 10 shows the total distortion obtained by the different
algorithms on the different data sets. The values are obtained after
N = 500 iterations with $\alpha_0$ = 0.005. The distortion obtained using
the proposed algorithm is the minimum among all algorithms for each of the
data sets.
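
As a rough illustration of how such a total can be computed, the sketch below assumes that Z corresponds to the within-cluster sum of squared distances given in the introduction, with every vector assigned to its nearest prototype. If Z is defined differently, the function would have to be adapted. The name `total_distortion` is hypothetical.

```python
def total_distortion(points, prototypes):
    """Sum over all points of the squared distance to the nearest prototype.

    Assumption: each point belongs to the cluster of its nearest prototype,
    so the sum equals the within-cluster sum of squared distances.
    """
    total = 0.0
    for p in points:
        total += min(sum((a - b) ** 2 for a, b in zip(p, v))
                     for v in prototypes)
    return total


points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
prototypes = [(0.0, 0.0), (4.0, 0.0)]
print(total_distortion(points, prototypes))  # 0 + 1 + 0 = 1.0
```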

4.3  Results  on  IRS  Imagery 

The performance of the proposed algorithm is tested on IRS imagery.
IRS stands for Indian Remote Sensing Satellite. The data used for this work
is taken from the satellite IRS-1B. The satellite is equipped with two
different sensors, LISS I and LISS II. The data used for this work is from
the LISS II sensor. LISS II has a focal length of 324.4 mm and a spectral
range of 0.45-0.86 micrometers. The whole spectral range is divided into 4
bands, namely Blue (0.45-0.52 μm), Green (0.52-0.59 μm), Red (0.62-0.68 μm)
and Infrared (0.77-0.86 μm).

The scene used for evaluating the performance of the algorithm is the
Calcutta scene. A 256 x 256 image is taken for each of the four bands.
Figures 27, 28, 29 and 30 show the corresponding Band 1, Band 2, Band 3 and
Band 4 images. The region primarily consists of 6 different types of land
cover. The 6 classes are Clear Water, Turbid Water, Concrete Structures,
Habitation, Vegetation and Open Space.

The  constituents  of  these  classes  are  described  below. 

1.  Pure  Water:  This  class  contains  pond  water. 

2.  Turbid  Water:  This  class  contains  rivers. 

3. Concrete: This class contains buildings, railway lines and roads.

4. Habitation: This class basically consists of suburban and rural
habitation, i.e. concrete structures of comparatively lower density.

5. Vegetation: This class represents the crop area and the forest area.

6. Open Space: This class contains barren land and sand.

The proposed algorithm is used for clustering the pixels in this IRS image,
with the number of clusters k taken to be 6, 5, 4 and 3. The best results
are obtained with k = 3. The reconstructed image with k = 3 is shown in
Figure 31. Each of the 3 classes in this image is shown separately in
Figures 32, 33 and 34. It has been possible to label 2 of the 3 clusters in
the images. The 2 classes that can be clearly identified are Water and Land,
while the third class consists of very few pixels (noise pixels). The
results with k = 4, k = 5 and k = 6 are, however, not satisfactory.
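
The step from a trained codebook to a reconstructed class image can be sketched as below. This is a minimal illustration and not the clustering algorithm itself: it assumes that each pixel is described by its four band values and is assigned to the nearest prototype. The prototypes and the tiny 2 x 2 scene are made-up numbers, not IRS data.

```python
def nearest_prototype(vector, prototypes):
    """Index of the prototype with the smallest squared Euclidean distance."""
    distances = [sum((a - b) ** 2 for a, b in zip(vector, v))
                 for v in prototypes]
    return distances.index(min(distances))


def label_image(bands, prototypes):
    """Assign every pixel of a multi-band image to its nearest prototype.

    bands: list of 2-D lists (rows x cols), one per spectral band.
    Returns a rows x cols list of cluster indices.
    """
    rows, cols = len(bands[0]), len(bands[0][0])
    return [[nearest_prototype([band[r][c] for band in bands], prototypes)
             for c in range(cols)]
            for r in range(rows)]


# Toy 2 x 2 scene with 4 bands and two hypothetical prototypes.
bands = [
    [[10, 12], [80, 82]],  # band 1
    [[11, 13], [79, 85]],  # band 2
    [[9, 10], [90, 88]],   # band 3
    [[5, 6], [95, 97]],    # band 4
]
prototypes = [(10, 12, 10, 5), (82, 82, 89, 96)]
print(label_image(bands, prototypes))  # [[0, 0], [1, 1]]
```

Displaying one binary mask per cluster index then gives per-class images of the kind shown in Figures 32, 33 and 34.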


5  Conclusions 

Here we have presented a new fuzzy learning vector quantization
algorithm. The algorithm uses a membership function which is continuous on
(0, 1] and hence, unlike the other algorithms, it removes the inherent bias
towards the winner. This helps in increasing the competitive effect among
the prototypes. The large non-winner update, which was present in other
algorithms, is also eliminated. Hence the performance of this algorithm on
a data set having classes of unequal size is much better. The algorithm is
tested using different numbers of iterations and different learning rates.
The results obtained in all these cases are the same. The total distortion
obtained by this algorithm is also the least. The proposed algorithm assumes
that the number of clusters present in the data set is greater than or equal
to 2. Experiments are conducted with data sets which are non-overlapping,
where the algorithm performs better than all other existing methods. The
proposed algorithm requires slightly more computation than the previous
algorithms, as it has to compute exponential membership functions.

A possible way of improving all these algorithms is to find a concrete
mathematical setup, with theorems and proofs, which judges the performance
of the algorithms, and to use the results for further modifications. This,
however, is not attempted here.


6  Tables  and  Figures 


TABLE 1: Performance of the algorithms on the IRIS data set, N = 500.

Algorithm            Initial learning rate    Classification Errors
GLVQ                 0.5                      17
                     0.6                      17
                     0.05                     17
FALVQ                0.05                     17
                     0.005                    16
Harmonic FALVQ       0.05                     16
                     0.005                    16
Geometric FALVQ      0.05                     12
                     0.005                    12
Arithmetic FALVQ     0.05                     23
                     0.005                    23
Proposed Method      0.05                     17
                     0.005                    16



TABLE 2: Performance of the algorithms on the scaled IRIS data (IRIS/10),
N = 500.

Algorithm            Initial learning rate    Classification Errors
GLVQ                 0.6                      50
FALVQ                0.005                    16
Harmonic FALVQ                                16
Geometric FALVQ      0.005                    12
Arithmetic FALVQ     0.005                    23
Proposed Method      0.005                    17


TABLE 3: Performance of the algorithms on 3 unequal sized classes with
feature vectors taken from the IRIS data set, N = 500, $\alpha_0$ = 0.005.
Class 1 contains 50 vectors.
Class 2 contains 30 vectors.
Class 3 contains 10 vectors.

Algorithm            Classification Errors
GLVQ                 2
FALVQ                2
Harmonic FALVQ       4
Geometric FALVQ      18
Arithmetic FALVQ     26
Proposed Method      2


TABLE 4: Performance of the algorithms on the artificial data set with 2
classes of unequal sizes. Interclass distance = 0.5, $\alpha_0$ = 0.005,
N = 500.

Algorithm            Classification Errors
GLVQ                 22
FALVQ                22
Harmonic FALVQ       32
Geometric FALVQ      62
Arithmetic FALVQ     128
Proposed Method      0



TABLE 5: Performance of the algorithms on the artificial data set with 2
classes of unequal sizes. Interclass distance = 0.2, $\alpha_0$ = 0.005,
N = 500.

Algorithm            Classification Errors
GLVQ                 47
FALVQ                45
Harmonic FALVQ       53
Geometric FALVQ      99
Arithmetic FALVQ     150
Proposed Method      4


TABLE 6: Performance of the algorithms on the artificial data set with 2
classes of unequal sizes. Interclass distance = 0.01, $\alpha_0$ = 0.005,
N = 500.

Algorithm            Classification Errors
GLVQ                 92
FALVQ                74
Harmonic FALVQ       85
Geometric FALVQ      11
Arithmetic FALVQ     167
Proposed Method      22


TABLE 7: Actual class centers and cluster centers produced by different
algorithms on the artificial data set with interclass distance = 0.5,
$\alpha_0$ = 0.005, N = 500.

Algorithm               Cluster 1            Cluster 2
GLVQ                    (-0.038,0.027)       (3.24,0.027)
FALVQ                   (-0.064,0.014)       (3.27,0.028)
Harmonic FALVQ          (-0.119,0.011)       (3.164,0.029)
Geometric FALVQ         (-0.058,0.002)       (2.62,0.026)
Arithmetic FALVQ        (0.14,-0.003)        (1.628,0.051)
Proposed Method         (0.18,0.02)          (3.81,0.022)
Actual Class Center     (0,0)                (3.5,0)



TABLE 8: Actual class centers and cluster centers produced by different
algorithms on the artificial data set with interclass distance = 0.2,
N = 500, $\alpha_0$ = 0.005.

Algorithm               Cluster 1            Cluster 2
GLVQ                    (-0.087,0.013)       (2.81,0.037)
FALVQ                   (-0.108,0.011)       (2.88,0.033)
Harmonic FALVQ          (-0.15,0.0108)       (2.79,0.03)
Geometric FALVQ         (-0.086,-0.004)      (2.25,0.044)
Arithmetic FALVQ        (0.10,-0.004)        (1.49,0.050)
Proposed Method         (0.18,0.02)          (3.48,0.002)
Actual Class Center     (0,0)                (3.2,0)


TABLE 9: Actual class centers and cluster centers produced by different
algorithms on the artificial data set with interclass distance = 0.01,
N = 500, $\alpha_0$ = 0.005.

Algorithm               Cluster 1            Cluster 2
GLVQ                    (-0.168,-0.021)      (2.44,0.085)
FALVQ                   (-0.154,0.002)       (2.6,0.052)
Harmonic FALVQ          (-0.18,0.003)        (2.53,0.04)
Geometric FALVQ         (-0.09,-0.007)       (2.07,0.05)
Arithmetic FALVQ        (0.07,0.005)         (1.404,0.036)
Proposed Method         (0.131,0.019)        (3.12,0.010)
Actual Class Center     (0,0)                (3.01,0)


TABLE 10: Distortion produced by different algorithms, N = 500,
$\alpha_0$ = 0.005. $\delta$ is the interclass distance in the artificial
data set.

                                    Artificial data set
Algorithm            IRIS        $\delta$ = 0.5   $\delta$ = 0.2   $\delta$ = 0.01
GLVQ                 225.4       2073.96          2033.79          1997.93
FALVQ                216.48      2131.24          2065.13          2010.04
Harmonic FALVQ       369.21      2648.21          2527.72          2438.23
Geometric FALVQ      752.97      3446.47          3157.055         2978.22
Arithmetic FALVQ     1162.73     4024.84          3620.17          3382.28
Proposed Method      102.88      1360.21          1409.49          1432.13



TABLE 11: Number of misclassifications produced by different algorithms for
different numbers of iterations, $\alpha_0$ = 0.005.

                     Number of iterations, N
Algorithm            N = 300     N = 500     N = 800
GLVQ                 22          22          22
FALVQ                22          22          22
Harmonic FALVQ       32          32          32
Geometric FALVQ      62          62          62
Arithmetic FALVQ     128         128         128
Proposed Method      0           0           0


TABLE 12: Number of misclassifications produced by different algorithms for
different learning rates, N = 500.

                     Learning rate, $\alpha_0$
Algorithm            $\alpha_0$ = 0.003   $\alpha_0$ = 0.005   $\alpha_0$ = 0.007
GLVQ                 22                   22                   22
FALVQ                22                   22                   22
Harmonic FALVQ       32                   32                   32
Geometric FALVQ      62                   62                   62
Arithmetic FALVQ     128                  128                  128
Proposed Method      0                    0                    0


TABLE 13: Number of misclassifications for different initializations,
N = 500, $\alpha_0$ = 0.005.

                     Trial
Algorithm            #1      #2      #3      #4      #5
Proposed Method      0       0       0       0       0

Fig. 2: Updating the winning LVQ prototype.

Fig. 3: Artificial data set with two classes of unequal size, interclass
distance = 0.5.

Fig. 4: Performance of the GLVQ algorithm, interclass distance = 0.5.

Fig. 5: Performance of the FALVQ algorithm.

Fig. 6: Performance of the Harmonic FALVQ algorithm.

Fig. 11: Performance of the GLVQ algorithm.

Fig. 12: Performance of the FALVQ algorithm.

Fig. 13: Performance of the Harmonic FALVQ algorithm, interclass
distance = 0.2.

Fig. 14: Performance of the Geometric FALVQ algorithm.

Fig. 15: Performance of the Arithmetic FALVQ algorithm, interclass
distance = 0.2.

Fig. 17: Artificial data set with two classes of unequal size, interclass
distance = 0.01.

Fig. 18: Performance of the GLVQ algorithm, interclass distance = 0.01.

Fig. 19: Performance of the FALVQ algorithm, interclass distance = 0.01.

Fig. 21: Performance of the Geometric FALVQ algorithm, interclass
distance = 0.01.

Fig. 23: Performance of the proposed algorithm.

Fig. 24:
Cluster centers produced for the artificial data set by different algorithms
Interclass distance = 0.5
Physical cluster center: class 1 - (0,0), class 2 - (3.5,0)

Fig. 25:
Cluster centers produced for the artificial data set by different algorithms
Interclass distance = 0.2
Physical cluster center: class 1

Fig. 26:
Cluster centers produced for the artificial data set by different algorithms
Interclass distance = 0.01
Physical cluster center: class 1 - (0,0), class 2 - (3.01,0)

Fig. 27: Original 256 x 256 Band-1 image.

Fig. 28: Original 256 x 256 Band-2 image.

Fig. 29: Original 256 x 256 Band-3 image.

Fig. 30: Original 256 x 256 Band-4 image.

Fig. 33: "Land" class obtained with k = 3.

Fig. 34: Noise obtained in the third class with k = 3.